Overview

Dataset statistics

Number of variables: 18
Number of observations: 23856
Missing cells: 182
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.9 MiB
Average record size in memory: 259.5 B

Variable types

NUM: 15
CAT: 2
BOOL: 1

Reproduction

Analysis started: 2020-06-08 04:08:31.108343
Analysis finished: 2020-06-08 04:09:17.135078
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

INCIDENT_ID has high cardinality: 23856 distinct values [High cardinality]
DATE has high cardinality: 9121 distinct values [High cardinality]
X_3 is highly correlated with X_2 [High correlation]
X_2 is highly correlated with X_3 [High correlation]
X_10 is highly skewed (γ1 = 34.9427132) [Skewed]
X_12 is highly skewed (γ1 = 30.61908319) [Skewed]
DATE contains only datetime values but is treated as categorical; consider applying pd.to_datetime() [Type]
X_1 has 19036 (79.8%) zeros [Zeros]
X_4 has 3335 (14.0%) zeros [Zeros]
X_5 has 4695 (19.7%) zeros [Zeros]
X_7 has 3461 (14.5%) zeros [Zeros]
X_8 has 8774 (36.8%) zeros [Zeros]
X_11 has 2553 (10.7%) zeros [Zeros]
X_12 has 5171 (21.7%) zeros [Zeros]
X_14 has 288 (1.2%) zeros [Zeros]
X_15 has 1017 (4.3%) zeros [Zeros]
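The DATE warning suggests converting the column to datetimes. Every DATE value visible in this report follows a `DD-MON-YY` layout (for example `12-SEP-01`). Assuming all values match `%d-%b-%y`, a minimal standard-library sketch can parse them. The helper name `parse_incident_date` is hypothetical. With pandas, the same format string would be passed as `pd.to_datetime(df["DATE"], format="%d-%b-%y")`, where `df` is an assumed DataFrame holding this dataset.

```python
from datetime import date, datetime

# Assumed layout: two-digit day, three-letter month abbreviation, two-digit year.
DATE_FORMAT = "%d-%b-%y"


def parse_incident_date(raw: str) -> date:
    """Parse a DD-MON-YY string such as '12-SEP-01' into a date.

    strptime matches month abbreviations case-insensitively, so 'SEP' is
    accepted in the default English/C locale. Two-digit years 69-99 map to
    1969-1999 and 00-68 map to 2000-2068.
    """
    return datetime.strptime(raw, DATE_FORMAT).date()


if __name__ == "__main__":
    for raw in ["12-SEP-01", "02-NOV-00", "14-MAY-93"]:
        print(raw, "->", parse_incident_date(raw).isoformat())
```

Note the two-digit-year pivot. It fits the 1990s to 2010s dates in the sample rows, but any incident before 1969 would be placed in the wrong century.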

Variables

INCIDENT_ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 23856
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 186.5 KiB

Common values
Value Count Frequency (%)
CR_12841 1 < 0.1%
CR_34689 1 < 0.1%
CR_127496 1 < 0.1%
CR_127201 1 < 0.1%
CR_157488 1 < 0.1%
CR_96357 1 < 0.1%
CR_67148 1 < 0.1%
CR_154505 1 < 0.1%
CR_106890 1 < 0.1%
CR_66045 1 < 0.1%
Other values (23846) 23846 > 99.9%

Length

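Max length: 9
Mean length: 8.44714118
Min length: 4

Distinct characters by Unicode category
Value Count Frequency (%)
Decimal_Number 10 76.9%
Uppercase_Letter 2 15.4%
Connector_Punctuation 1 7.7%

Distinct characters by Unicode script
Value Count Frequency (%)
Common 11 84.6%
Latin 2 15.4%

Distinct characters by Unicode block
Value Count Frequency (%)
ASCII 13 100.0%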

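The character tables count distinct characters grouped by Unicode general category. For DATE, for example, the 19 uppercase letters are exactly the distinct letters of the twelve month abbreviations. A minimal sketch using Python's `unicodedata` module, applied to the ten INCIDENT_ID values from the frequency table. The code-to-name mapping covers only the categories that occur in this report, and the helper name is illustrative.

```python
import unicodedata
from collections import defaultdict

# Long names for the general-category codes that occur in this report only;
# any other code is reported as its two-letter abbreviation.
CATEGORY_NAMES = {
    "Nd": "Decimal_Number",
    "Lu": "Uppercase_Letter",
    "Pc": "Connector_Punctuation",
    "Pd": "Dash_Punctuation",
}


def distinct_chars_by_category(values):
    """Count the distinct characters across all strings, per Unicode general category."""
    groups = defaultdict(set)
    for ch in set("".join(values)):
        code = unicodedata.category(ch)
        groups[CATEGORY_NAMES.get(code, code)].add(ch)
    return {name: len(chars) for name, chars in groups.items()}


# The ten INCIDENT_ID values listed in the frequency table.
ids = ["CR_12841", "CR_34689", "CR_127496", "CR_127201", "CR_157488",
       "CR_96357", "CR_67148", "CR_154505", "CR_106890", "CR_66045"]
print(distinct_chars_by_category(ids))
```

These ten IDs already contain all ten digits, so the sketch yields the same 10 / 2 / 1 split reported for the full column. `unicodedata` does not expose script or block names (Common, Latin, ASCII), so those tables are not reproduced here.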
DATE
Categorical

HIGH CARDINALITY
TYPE DATE
UNIFORM
Distinct count: 9121
Distinct (%): 38.2%
Missing: 0
Missing (%): 0.0%
Memory size: 186.5 KiB

Common values
Value Count Frequency (%)
12-SEP-01 22 0.1%
13-SEP-01 20 0.1%
17-SEP-01 17 0.1%
11-SEP-01 15 0.1%
15-SEP-01 15 0.1%
26-SEP-01 13 0.1%
16-SEP-01 12 0.1%
19-SEP-01 11 < 0.1%
02-NOV-00 11 < 0.1%
20-SEP-01 11 < 0.1%
Other values (9111) 23709 99.4%

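Each frequency table lists the most frequent values and folds the rest into an "Other values (k)" row. Here k is the number of remaining distinct values, and the count is the total minus the listed counts. For DATE, that gives 23856 - 147 = 23709 observations spread over 9121 - 10 = 9111 values. A minimal sketch of that bookkeeping with `collections.Counter`, run on a toy input. The function name and the `top` parameter are illustrative, not the profiler's API.

```python
from collections import Counter


def frequency_table(values, top=10):
    """Return (value, count, percent) rows plus an 'Other values (k)' row."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(v, c, 100.0 * c / total) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in rows)
    remaining_distinct = len(counts) - len(rows)
    if remaining_distinct > 0:
        rows.append((f"Other values ({remaining_distinct})", total - shown,
                     100.0 * (total - shown) / total))
    return rows


# Toy input, not taken from the dataset.
sample = ["12-SEP-01"] * 3 + ["13-SEP-01"] * 2 + ["02-NOV-00", "19-SEP-01"]
for value, count, pct in frequency_table(sample, top=2):
    print(f"{value} {count} {pct:.1f}%")
```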
Length

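Max length: 9
Mean length: 9
Min length: 9

Distinct characters by Unicode category
Value Count Frequency (%)
Uppercase_Letter 19 63.3%
Decimal_Number 10 33.3%
Dash_Punctuation 1 3.3%

Distinct characters by Unicode script
Value Count Frequency (%)
Latin 19 63.3%
Common 11 36.7%

Distinct characters by Unicode block
Value Count Frequency (%)
ASCII 30 100.0%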

X_1
Real number (ℝ≥0)

ZEROS
Distinct count: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.483777666
Minimum: 0
Maximum: 7
Zeros: 19036
Zeros (%): 79.8%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 3
Maximum: 7
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.439737889
Coefficient of variation (CV): 2.976032152
Kurtosis: 13.65891063
Mean: 0.483777666
Mean absolute deviation (MAD): 0.7720650277
Skewness: 3.789307148
Sum: 11541
Variance: 2.072845188
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5.5 6.5 7. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 19036 79.8%
1 3497 14.7%
7 876 3.7%
5 270 1.1%
3 136 0.6%
4 26 0.1%
2 10 < 0.1%
6 5 < 0.1%

Minimum 5 values
Value Count Frequency (%)
0 19036 79.8%
1 3497 14.7%
2 10 < 0.1%
3 136 0.6%
4 26 0.1%

Maximum 5 values
Value Count Frequency (%)
7 876 3.7%
6 5 < 0.1%
5 270 1.1%
4 26 0.1%
3 136 0.6%

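X_1 takes only eight distinct values, so its descriptive statistics can be recomputed exactly from its value counts. A minimal sketch, assuming sample (n - 1) variance and the adjusted Fisher-Pearson skewness G1, which is what pandas' `Series.skew()` computes. The helper names are illustrative. The sketch also shows why the reported MAD of 0.772 must be a mean absolute deviation about the mean. The median of X_1 is 0 and 79.8% of the values equal it, so the median absolute deviation is 0.

```python
import math

# X_1 value counts, copied from its frequency table (all 8 distinct values).
X1_COUNTS = {0: 19036, 1: 3497, 2: 10, 3: 136, 4: 26, 5: 270, 6: 5, 7: 876}


def order_statistic(counts, k):
    """Return the k-th smallest value (0-based) of the sample encoded by counts."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if k < seen:
            return value
    raise IndexError(k)


def median_from_counts(counts):
    """Mean of the two middle order statistics (they coincide when n is odd)."""
    n = sum(counts.values())
    return (order_statistic(counts, (n - 1) // 2) + order_statistic(counts, n // 2)) / 2


def describe_from_counts(counts):
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Population central moments about the mean.
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    variance = m2 * n / (n - 1)  # sample variance (n - 1 denominator)
    std = math.sqrt(variance)
    # Adjusted Fisher-Pearson G1, rescaled from the biased g1 = m3 / m2**1.5.
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    median = median_from_counts(counts)
    # Re-encode |x - median| as a value-count table to take its median.
    abs_dev_counts = {}
    for v, c in counts.items():
        d = abs(v - median)
        abs_dev_counts[d] = abs_dev_counts.get(d, 0) + c
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,
        "skewness": skewness,
        "mean_abs_dev": sum(c * abs(v - mean) for v, c in counts.items()) / n,
        "median": median,
        "median_abs_dev": median_from_counts(abs_dev_counts),
    }


for name, value in describe_from_counts(X1_COUNTS).items():
    print(f"{name}: {value:.10g}")
```

The printed mean, standard deviation, coefficient of variation and mean absolute deviation should agree with the report to the precision shown. The skewness should agree to at least four decimal places.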
X_2
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 52
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.79120557
Minimum: 0
Maximum: 52
Zeros: 22
Zeros (%): 0.1%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 7
Median: 24
Q3: 36
95th percentile: 49
Maximum: 52
Range: 52
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 15.24023098
Coefficient of variation (CV): 0.6147434395
Kurtosis: -1.30551524
Mean: 24.79120557
Mean absolute deviation (MAD): 13.26368974
Skewness: -0.0947521072
Sum: 591419
Variance: 232.2646403
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.5 2.5 3.5 4.5 ... 48.5 49.5 50.5 51.5 52. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
4 4029 16.9%
36 2232 9.4%
33 2174 9.1%
24 1344 5.6%
21 1254 5.3%
37 962 4.0%
49 927 3.9%
45 908 3.8%
3 778 3.3%
22 672 2.8%
Other values (42) 8576 35.9%

Minimum 5 values
Value Count Frequency (%)
0 22 0.1%
1 20 0.1%
2 116 0.5%
3 778 3.3%
4 4029 16.9%

Maximum 5 values
Value Count Frequency (%)
52 19 0.1%
51 103 0.4%
50 160 0.7%
49 927 3.9%
48 55 0.2%

X_3
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 52
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.6374497
Minimum: 0
Maximum: 52
Zeros: 20
Zeros (%): 0.1%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 8
Median: 24
Q3: 35
95th percentile: 49
Maximum: 52
Range: 52
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 15.1350925
Coefficient of variation (CV): 0.6143124669
Kurtosis: -1.237143987
Mean: 24.6374497
Mean absolute deviation (MAD): 12.98383111
Skewness: -0.08212039854
Sum: 587751
Variance: 229.071025
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.5 2.5 3.5 4.5 ... 48.5 49.5 50.5 51.5 52. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
4 4029 16.9%
34 2232 9.4%
32 2174 9.1%
24 1344 5.6%
23 1254 5.3%
37 962 4.0%
49 927 3.9%
45 908 3.8%
2 778 3.3%
22 672 2.8%
Other values (42) 8576 35.9%

Minimum 5 values
Value Count Frequency (%)
0 20 0.1%
1 22 0.1%
2 778 3.3%
3 116 0.5%
4 4029 16.9%

Maximum 5 values
Value Count Frequency (%)
52 19 0.1%
51 160 0.7%
50 103 0.4%
49 927 3.9%
48 641 2.7%

X_4
Real number (ℝ≥0)

ZEROS
Distinct count: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.276743796
Minimum: 0
Maximum: 10
Zeros: 3335
Zeros (%): 14.0%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 4
Q3: 6
95th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.944672067
Coefficient of variation (CV): 0.6885313238
Kurtosis: -1.013239087
Mean: 4.276743796
Mean absolute deviation (MAD): 2.588557087
Skewness: 0.1833932631
Sum: 102026
Variance: 8.671093584
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 4.5 5.5 6.5 8. 9.5 10. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
6 5497 23.0%
2 4791 20.1%
0 3335 14.0%
7 2890 12.1%
4 2027 8.5%
3 1871 7.8%
9 1360 5.7%
10 1242 5.2%
1 841 3.5%
5 2 < 0.1%

Minimum 5 values
Value Count Frequency (%)
0 3335 14.0%
1 841 3.5%
2 4791 20.1%
3 1871 7.8%
4 2027 8.5%

Maximum 5 values
Value Count Frequency (%)
10 1242 5.2%
9 1360 5.7%
7 2890 12.1%
6 5497 23.0%
5 2 < 0.1%

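The quantile statistics can also be checked against a value-count table. A minimal sketch, assuming linear interpolation at position q·(n - 1), which is the default of pandas' `Series.quantile`. The X_4 counts are copied from its frequency table, and the helper names are illustrative.

```python
import math

# X_4 value counts, copied from its frequency table (all 10 distinct values).
X4_COUNTS = {0: 3335, 1: 841, 2: 4791, 3: 1871, 4: 2027,
             5: 2, 6: 5497, 7: 2890, 9: 1360, 10: 1242}


def order_statistic(counts, k):
    """Return the k-th smallest value (0-based) of the sample encoded by counts."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if k < seen:
            return value
    raise IndexError(k)


def quantile(counts, q):
    """Quantile by linear interpolation between the order statistics around q * (n - 1)."""
    n = sum(counts.values())
    pos = q * (n - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    a, b = order_statistic(counts, lo), order_statistic(counts, hi)
    return a + (b - a) * (pos - lo)


stats = {name: quantile(X4_COUNTS, q) for name, q in
         [("5th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
          ("Q3", 0.75), ("95th percentile", 0.95)]}
stats["IQR"] = stats["Q3"] - stats["Q1"]
print(stats)
```

With 23856 observations, Q1 sits at position 5963.75. That falls inside the run of 2s (positions 4176 to 8966), so no interpolation between different values is needed. The same holds for the other quantiles of X_4.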
X_5
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.455608652
Minimum: 0
Maximum: 5
Zeros: 4695
Zeros (%): 19.7%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 3
Q3: 5
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.963094729
Coefficient of variation (CV): 0.7994330562
Kurtosis: -1.558871205
Mean: 2.455608652
Mean absolute deviation (MAD): 1.798653054
Skewness: 0.1752310231
Sum: 58581
Variance: 3.853740916
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 4. 5. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
5 7368 30.9%
1 6818 28.6%
3 4973 20.8%
0 4695 19.7%
2 2 < 0.1%

Minimum 5 values
Value Count Frequency (%)
0 4695 19.7%
1 6818 28.6%
2 2 < 0.1%
3 4973 20.8%
5 7368 30.9%

Maximum 5 values
Value Count Frequency (%)
5 7368 30.9%
3 4973 20.8%
2 2 < 0.1%
1 6818 28.6%
0 4695 19.7%

X_6
Real number (ℝ≥0)

Distinct count: 19
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.15417505
Minimum: 1
Maximum: 19
Zeros: 0
Zeros (%): 0.0%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 5
Q3: 8
95th percentile: 15
Maximum: 19
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.471756047
Coefficient of variation (CV): 0.7266215229
Kurtosis: 0.03760850344
Mean: 6.15417505
Mean absolute deviation (MAD): 3.459516952
Skewness: 0.9608294397
Sum: 146814
Variance: 19.99660214
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 3.5 4.5 ... 15.5 16.5 17.5 18.5 19. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
1 3461 14.5%
5 2679 11.2%
6 2629 11.0%
4 2319 9.7%
15 2318 9.7%
2 2298 9.6%
7 2286 9.6%
3 1708 7.2%
8 1405 5.9%
9 1267 5.3%
Other values (9) 1486 6.2%

Minimum 5 values
Value Count Frequency (%)
1 3461 14.5%
2 2298 9.6%
3 1708 7.2%
4 2319 9.7%
5 2679 11.2%

Maximum 5 values
Value Count Frequency (%)
19 2 < 0.1%
18 162 0.7%
17 110 0.5%
16 620 2.6%
15 2318 9.7%

X_7
Real number (ℝ≥0)

ZEROS
Distinct count: 19
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.876509054
Minimum: 0
Maximum: 18
Zeros: 3461
Zeros (%): 14.5%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 4
Q3: 7
95th percentile: 12
Maximum: 18
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.881930665
Coefficient of variation (CV): 0.7960470538
Kurtosis: 0.493689765
Mean: 4.876509054
Mean absolute deviation (MAD): 3.131351405
Skewness: 0.7961675929
Sum: 116334
Variance: 15.06938569
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 3.5 4.5 ... 12.5 13.5 15.5 17.5 18. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 3461 14.5%
6 2679 11.2%
4 2629 11.0%
2 2319 9.7%
10 2318 9.7%
7 2298 9.6%
1 2286 9.6%
5 1708 7.2%
3 1405 5.9%
8 1267 5.3%
Other values (9) 1486 6.2%

Minimum 5 values
Value Count Frequency (%)
0 3461 14.5%
1 2286 9.6%
2 2319 9.7%
3 1405 5.9%
4 2629 11.0%

Maximum 5 values
Value Count Frequency (%)
18 139 0.6%
17 200 0.8%
16 210 0.9%
15 25 0.1%
14 18 0.1%

X_8
Real number (ℝ≥0)

ZEROS
Distinct count: 24
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9724597586
Minimum: 0
Maximum: 99
Zeros: 8774
Zeros (%): 36.8%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 1
95th percentile: 3
Maximum: 99
Range: 99
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.453144468
Coefficient of variation (CV): 1.494297789
Kurtosis: 952.9615467
Mean: 0.9724597586
Mean absolute deviation (MAD): 0.7153220927
Skewness: 17.70384903
Sum: 23199
Variance: 2.111628843
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 10.5 15.5 21.5 99. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
1 11010 46.2%
0 8774 36.8%
2 2268 9.5%
3 967 4.1%
4 404 1.7%
5 207 0.9%
6 79 0.3%
7 33 0.1%
8 32 0.1%
10 23 0.1%
Other values (14) 59 0.2%

Minimum 5 values
Value Count Frequency (%)
0 8774 36.8%
1 11010 46.2%
2 2268 9.5%
3 967 4.1%
4 404 1.7%

Maximum 5 values
Value Count Frequency (%)
99 1 < 0.1%
50 1 < 0.1%
30 1 < 0.1%
29 1 < 0.1%
22 1 < 0.1%

X_9
Real number (ℝ≥0)

Distinct count: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.924128102
Minimum: 0
Maximum: 6
Zeros: 118
Zeros (%): 0.5%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 5
Q3: 6
95th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.362624612
Coefficient of variation (CV): 0.276724038
Kurtosis: 1.28166232
Mean: 4.924128102
Mean absolute deviation (MAD): 0.9247586669
Skewness: -1.525286754
Sum: 117470
Variance: 1.856745834
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5.5 6. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
5 10559 44.3%
6 9508 39.9%
2 3040 12.7%
3 452 1.9%
1 175 0.7%
0 118 0.5%
4 4 < 0.1%

Minimum 5 values
Value Count Frequency (%)
0 118 0.5%
1 175 0.7%
2 3040 12.7%
3 452 1.9%
4 4 < 0.1%

Maximum 5 values
Value Count Frequency (%)
6 9508 39.9%
5 10559 44.3%
4 4 < 0.1%
3 452 1.9%
2 3040 12.7%

X_10
Real number (ℝ≥0)

SKEWED
Distinct count: 24
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.244802146
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 90
Range: 89
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.119300682
Coefficient of variation (CV): 0.8991795888
Kurtosis: 2190.137157
Mean: 1.244802146
Mean absolute deviation (MAD): 0.4145299924
Skewness: 34.9427132
Sum: 29696
Variance: 1.252834017
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 3.5 4.5 6.5 10.5 21. 90. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
1 20198 84.7%
2 2695 11.3%
3 549 2.3%
4 225 0.9%
5 71 0.3%
6 54 0.2%
8 15 0.1%
10 14 0.1%
9 7 < 0.1%
7 7 < 0.1%
Other values (14) 21 0.1%

Minimum 5 values
Value Count Frequency (%)
1 20198 84.7%
2 2695 11.3%
3 549 2.3%
4 225 0.9%
5 71 0.3%

Maximum 5 values
Value Count Frequency (%)
90 1 < 0.1%
58 1 < 0.1%
50 1 < 0.1%
40 1 < 0.1%
30 1 < 0.1%

X_11
Real number (ℝ≥0)

ZEROS
Distinct count: 133
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206.9545188
Minimum: 0
Maximum: 332
Zeros: 2553
Zeros (%): 10.7%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 174
Median: 249
Q3: 249
95th percentile: 316
Maximum: 332
Range: 332
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 93.03334801
Coefficient of variation (CV): 0.4495352339
Kurtosis: 0.1944049511
Mean: 206.9545188
Mean absolute deviation (MAD): 74.2161433
Skewness: -0.9032002688
Sum: 4937107
Variance: 8655.203842
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 20.5 23. 41. ... 326. 327.5 328.5 331. 332. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
174 7275 30.5%
249 6930 29.0%
316 4500 18.9%
0 2553 10.7%
303 438 1.8%
127 304 1.3%
74 207 0.9%
179 206 0.9%
102 122 0.5%
263 103 0.4%
Other values (123) 1218 5.1%

Minimum 5 values
Value Count Frequency (%)
0 2553 10.7%
1 3 < 0.1%
6 1 < 0.1%
11 5 < 0.1%
12 1 < 0.1%

Maximum 5 values
Value Count Frequency (%)
332 3 < 0.1%
330 29 0.1%
329 21 0.1%
328 79 0.3%
327 1 < 0.1%

X_12
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 23
Distinct (%): 0.1%
Missing: 182
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9740643744
Minimum: 0
Maximum: 90
Zeros: 5171
Zeros (%): 21.7%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 90
Range: 90
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.167725118
Coefficient of variation (CV): 1.198817192
Kurtosis: 1880.955431
Mean: 0.9740643744
Mean absolute deviation (MAD): 0.425520561
Skewness: 30.61908319
Sum: 23060
Variance: 1.363581951
Histogram with fixed size bins (bins=10)

Common values
Value Count Frequency (%)
1 15674 65.7%
0 5171 21.7%
2 2039 8.5%
3 476 2.0%
4 176 0.7%
5 59 0.2%
6 36 0.2%
8 9 < 0.1%
10 7 < 0.1%
9 6 < 0.1%
Other values (13) 21 0.1%
(Missing) 182 0.8%

Minimum 5 values
Value Count Frequency (%)
0 5171 21.7%
1 15674 65.7%
2 2039 8.5%
3 476 2.0%
4 176 0.7%

Maximum 5 values
Value Count Frequency (%)
90 1 < 0.1%
58 1 < 0.1%
50 1 < 0.1%
40 1 < 0.1%
30 1 < 0.1%

X_13
Real number (ℝ≥0)

Distinct count: 60
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 85.23738263
Minimum: 0
Maximum: 116
Zeros: 1
Zeros (%): < 0.1%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 18
Q1: 72
Median: 98
Q3: 103
95th percentile: 112
Maximum: 116
Range: 116
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 27.59722639
Coefficient of variation (CV): 0.3237690499
Kurtosis: 1.093046857
Mean: 85.23738263
Mean absolute deviation (MAD): 21.83755548
Skewness: -1.388636749
Sum: 2033423
Variance: 761.6069043
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.5 4.5 8.5 9.5 ... 111.5 112.5 113.5 115.5 116. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
103 6995 29.3%
72 4476 18.8%
92 3255 13.6%
112 2116 8.9%
98 1366 5.7%
18 851 3.6%
109 537 2.3%
24 523 2.2%
12 427 1.8%
59 348 1.5%
Other values (50) 2962 12.4%

Minimum 5 values
Value Count Frequency (%)
0 1 < 0.1%
1 5 < 0.1%
2 210 0.9%
7 1 < 0.1%
8 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
116 288 1.2%
115 21 0.1%
114 16 0.1%
113 225 0.9%
112 2116 8.9%

X_14
Real number (ℝ≥0)

ZEROS
Distinct count: 62
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.67429577
Minimum: 0
Maximum: 142
Zeros: 288
Zeros (%): 1.2%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 29
Q1: 29
Median: 62
Q3: 107
95th percentile: 142
Maximum: 142
Range: 142
Interquartile range (IQR): 78

Descriptive statistics

Standard deviation: 43.2973203
Coefficient of variation (CV): 0.5957721342
Kurtosis: -1.324908795
Mean: 72.67429577
Mean absolute deviation (MAD): 38.55009015
Skewness: 0.2455877663
Sum: 1733718
Variance: 1874.657945
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1. 4. 9. 13. ... 137. 138.5 139.5 141. 142. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
29 8165 34.2%
93 3110 13.0%
142 2714 11.4%
62 2474 10.4%
80 1488 6.2%
130 1205 5.1%
107 734 3.1%
14 657 2.8%
119 579 2.4%
103 506 2.1%
Other values (52) 2224 9.3%

Minimum 5 values
Value Count Frequency (%)
0 288 1.2%
2 1 < 0.1%
6 119 0.5%
12 1 < 0.1%
14 657 2.8%

Maximum 5 values
Value Count Frequency (%)
142 2714 11.4%
140 74 0.3%
139 10 < 0.1%
138 137 0.6%
136 66 0.3%

X_15
Real number (ℝ≥0)

ZEROS
Distinct count: 28
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.46474681
Minimum: 0
Maximum: 50
Zeros: 1017
Zeros (%): 4.3%
Memory size: 186.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 23
Q1: 34
Median: 34
Q3: 34
95th percentile: 46
Maximum: 50
Range: 50
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.38683369
Coefficient of variation (CV): 0.2506169772
Kurtosis: 8.7395923
Mean: 33.46474681
Mean absolute deviation (MAD): 3.668170566
Skewness: -2.527453789
Sum: 798335
Variance: 70.33897934
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 2. 8.5 10.5 16.5 ... 39.5 42. 44.5 49. 50. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
34 18947 79.4%
43 1503 6.3%
0 1017 4.3%
46 668 2.8%
23 642 2.7%
48 521 2.2%
36 182 0.8%
50 145 0.6%
9 92 0.4%
39 54 0.2%
Other values (18) 85 0.4%

Minimum 5 values
Value Count Frequency (%)
0 1017 4.3%
4 4 < 0.1%
5 1 < 0.1%
8 1 < 0.1%
9 92 0.4%

Maximum 5 values
Value Count Frequency (%)
50 145 0.6%
48 521 2.2%
46 668 2.8%
43 1503 6.3%
41 6 < 0.1%

MULTIPLE_OFFENSE
Boolean

Distinct count: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 186.5 KiB

Common values
Value Count Frequency (%)
1 22788 95.5%
0 1068 4.5%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. Consequently, for an exact linear relationship, the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
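This recipe can be written out directly. A minimal sketch using sample (n - 1) covariance and standard deviations; the factors cancel, so population versions give the same r. The function name is illustrative. The first two calls illustrate the invariance described above: any exact line gives r = ±1 regardless of steepness. The third call shows that a monotonic but nonlinear relationship gives r < 1.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Undefined for a constant input: the division raises ZeroDivisionError.
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [3 * x + 7 for x in xs]))     # steep increasing line
print(pearson_r(xs, [-0.5 * x + 2 for x in xs]))  # shallow decreasing line
print(pearson_r(xs, [x ** 3 for x in xs]))        # monotonic but not linear
```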

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
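Computing ρ amounts to Pearson's r applied to ranks. A minimal sketch, assuming tied values receive the average of the ranks they span. This matters here because most columns in this dataset have only a few dozen distinct values. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)  # the common 1/(n - 1) factors cancel


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of X and Y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))  # monotonic but nonlinear
print(average_ranks([10, 20, 20, 30]))         # the tied pair shares rank 2.5
```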

Kendall's τ

Like Spearman's ρ, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
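The pair-counting definition translates into a short O(n²) sketch. This is the τ_a variant, in which tied pairs count as neither concordant nor discordant. SciPy's `scipy.stats.kendalltau` instead computes the tie-adjusted τ_b by default, so results differ on tie-heavy columns like these. Enumerating all pairs of 23856 rows would mean roughly 2.8 × 10^8 comparisons, so the sketch is meant for illustration only. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / (n * (n - 1) / 2); tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of the differences means the pair is ordered alike in X and Y.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # one swapped pair out of six
```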

Missing values

Sample

First rows

INCIDENT_ID DATE X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13 X_14 X_15 MULTIPLE_OFFENSE
0CR_10265904-JUL-040363421561611741.09229360
1CR_18975218-JUL-17137370011171612361.0103142341
2CR_18463715-MAR-1703235102311741.011093341
3CR_13907113-FEB-090333221711612491.07229341
4CR_10933513-APR-050333221830511740.011229431
5CR_9626307-APR-0304545103101613031.07262341
6CR_13140022-JAN-080303573710511740.011229431
7CR_1198114-MAY-9308773980513161.07262341
8CR_18413421-AUG-160494965831113161.010314341
9CR_3263425-AUG-961446515100521451.010329340

Last rows

INCIDENT_ID DATE X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13 X_14 X_15 MULTIPLE_OFFENSE
23846CR_7972417-SEP-01136342115100512491.092130341
23847CR_3803302-MAY-960333221561622492.010393341
23848CR_1438402-DEC-930262790350512491.0112130341
23849CR_6895325-APR-007252590980512491.07293341
23850CR_3320111-JUL-96044651026101.07229341
23851CR_8899111-JAN-02147487315101511740.09829341
23852CR_4636905-FEB-970333221560511740.011229431
23853CR_15755603-APR-120252590351611740.01029181
23854CR_10318025-JAN-040393965271611270.0112103431
23855CR_2257508-NOV-947363421980512491.09229341